Domain adaptation of statistical machine translation with domain-focused web crawling
نویسندگان
چکیده
منابع مشابه
Domain adaptation of statistical machine translation with domain-focused web crawling
In this paper, we tackle the problem of domain adaptation of statistical machine translation (SMT) by exploiting domain-specific data acquired by domain-focused crawling of text from the World Wide Web. We design and empirically evaluate a procedure for automatic acquisition of monolingual and parallel text and their exploitation for system training, tuning, and testing in a phrase-based SMT fr...
متن کاملImproved Domain Adaptation for Statistical Machine Translation
We present a simple and effective infrastructure for domain adaptation for statistical machine translation (MT). To build MT systems for different domains, it trains, tunes and deploys a single translation system that is capable of producing adapted domain translations and preserving the original generic accuracy at the same time. The approach unifies automatic domain detection and domain model...
متن کاملCombining Statistical Machine Translation and Translation Memories with Domain Adaptation
Since the emergence of translation memory software, translation companies and freelance translators have been accumulating translated text for various languages and domains. This data has the potential of being used for training domain-specific machine translation systems for corporate or even personal use. But while the resulting systems usually perform well in translating domain-specific lang...
متن کاملDomain Adaptation in Statistical Machine Translation with Mixture Modelling
Mixture modelling is a standard technique for density estimation, but its use in statistical machine translation (SMT) has just started to be explored. One of the main advantages of this technique is its capability to learn specific probability distributions that better fit subsets of the training dataset. This feature is even more important in SMT given the difficulties to translate polysemic ...
متن کاملDomain Adaptation in Statistical Machine Translation using Factored Translation Models
In recent years the performance of SMT increased in domains with enough training data. But under real-world conditions, it is often not possible to collect enough parallel data. We propose an approach to adapt an SMT system using small amounts of parallel in-domain data by introducing the corpus identifier (corpus id) as an additional target factor. Then we added features to model the generatio...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Language Resources and Evaluation
سال: 2014
ISSN: 1574-020X,1574-0218
DOI: 10.1007/s10579-014-9282-3